Learning a Stopping Criterion for Active Learning for Word Sense Disambiguation and Text Classification

Authors

Jingbo Zhu

Huizhen Wang

Eduard H. Hovy

Abstract

In this paper, we address the problem of knowing when to stop the active learning process. We propose a new statistical learning approach, called the minimum expected error strategy, which defines a stopping criterion by estimating the classifier’s expected error on future unlabeled examples during active learning. In experiments on word sense disambiguation and text classification, the proposed stopping criterion reduces human labeling costs by approximately 50% in word sense disambiguation with a 0.5% degradation in average accuracy, and by approximately 90% in text classification with a 2% degradation in average accuracy.
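The idea of stopping once the estimated error on the unlabeled pool is low enough can be illustrated with a minimal sketch. The abstract does not give the estimator, so the one used here (the average of one minus the highest class posterior over the pool) is an assumption, as are the threshold value `0.1` and the hypothetical function names `expected_error` and `should_stop`. It should not be read as the authors' exact formulation.

```python
from typing import Sequence


def expected_error(posteriors: Sequence[Sequence[float]]) -> float:
    """Assumed estimator: mean of (1 - max_y P(y|x)) over the unlabeled pool.

    Each inner sequence holds one example's class posterior probabilities,
    as produced by the current classifier.
    """
    if not posteriors:
        raise ValueError("unlabeled pool is empty")
    return sum(1.0 - max(p) for p in posteriors) / len(posteriors)


def should_stop(posteriors: Sequence[Sequence[float]],
                threshold: float = 0.1) -> bool:
    """Stop active learning once the estimated error falls to the threshold.

    The default threshold is an illustrative assumption, not a value
    taken from the paper.
    """
    return expected_error(posteriors) <= threshold


# Hypothetical posteriors for a two-class task.
uncertain_pool = [[0.9, 0.1], [0.6, 0.4], [0.8, 0.2]]
confident_pool = [[0.97, 0.03], [0.95, 0.05]]

print(should_stop(uncertain_pool))  # estimated error is about 0.23, keep labeling
print(should_stop(confident_pool))  # estimated error is about 0.04, stop
```

In a full active learning loop, a check like this would run after each retraining round, with the posteriors recomputed on the remaining unlabeled examples.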


Similar resources

Active Learning for Word Sense Disambiguation with Methods for Addressing the Class Imbalance Problem

In this paper, we analyze the effect of resampling techniques, including under-sampling and over-sampling, used in active learning for word sense disambiguation (WSD). Experimental results show that under-sampling causes negative effects on active learning, but over-sampling is a relatively good choice. To alleviate the within-class imbalance problem of over-sampling, we propose a bootstrap-based ...


Clinical Word Sense Disambiguation with Interactive Search and Classification

Resolving word ambiguity in clinical text is critical for many natural language processing applications. Effective word sense disambiguation (WSD) systems rely on training a machine learning based classifier with abundant clinical text that is accurately annotated, the creation of which can be costly and time-consuming. We describe a double-loop interactive machine learning process, named ReQ-R...


A literature survey of active machine learning in the context of natural language processing

Active learning is a supervised machine learning technique in which the learner is in control of the data used for learning. That control is utilized by the learner to ask an oracle, typically a human with extensive knowledge of the domain at hand, about the classes of the instances for which the model learned so far makes unreliable predictions. The active learning process takes as input a set...


The learning vector quantization algorithm applied to automatic text classification tasks

Automatic text classification is an important task for many natural language processing applications. This paper presents a neural approach to develop a text classifier based on the Learning Vector Quantization (LVQ) algorithm. The LVQ model is a classification method that uses a competitive supervised learning algorithm. The proposed method has been applied to two specific tasks: text categori...


Partially Supervised Sense Disambiguation by Learning Sense Number from Tagged and Untagged Corpora

Supervised and semi-supervised sense disambiguation methods will mis-tag the instances of a target word if the senses of these instances are not defined in sense inventories or there are no tagged instances for these senses in training data. Here we used a model order identification method to avoid the misclassification of the instances with undefined senses by discovering new senses from mixed...




Publication date: 2008
